This commit is contained in:
Alexander Stefanov 2020-03-06 14:55:00 +00:00
parent d7306bbbac
commit 9e27e7a84c
11 changed files with 1161 additions and 9 deletions

View file

@ -1,4 +1,2 @@
removed_sources:
pcre2-10.22.tar.bz2: 3be3891e1cb1caaa31fa89db51d015831f8f8089
sources:
pcre2-10.32.tar.bz2: 31dea762ff549cda09b7df33648f9d4cc3707cf8
pcre2-10.34.zip: b582e8667d8a57480d4848a56fdfd6b34f67dfad

View file

@ -0,0 +1,598 @@
From b3f42a32920b20ae71988bc1d06a7148e0211925 Mon Sep 17 00:00:00 2001
From: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Sat, 25 Jan 2020 15:50:44 +0000
Subject: [PATCH] Ensure a newline after the final line in a file is output by
pcre2grep.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1211 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
RunGrepTest | 4 +-
doc/html/pcre2grep.html | 84 ++++++++++++++++++++-------------
doc/pcre2grep.1 | 83 ++++++++++++++++++++-------------
doc/pcre2grep.txt | 100 ++++++++++++++++++++++++----------------
src/pcre2grep.c | 66 ++++++++++++++++++++++++--
testdata/grepoutputN | 16 ++++---
diff --git a/RunGrepTest b/RunGrepTest
index 1113cd4..2ff4f7c 100755
--- a/RunGrepTest
+++ b/RunGrepTest
@@ -742,11 +742,11 @@ uname=`uname`
case $uname in
Linux)
printf 'abc\0def' >testNinputgrep
- $valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/' >>testtrygrep
+ $valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/g' >>testtrygrep
echo "" >>testtrygrep
;;
*)
- echo '1:abcZERO2:def' >>testtrygrep
+ echo '1:abcZERO2:defZERO' >>testtrygrep
;;
esac
diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index f5b72f3..abbafa1 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -148,7 +148,7 @@ ignored.
By default, a file that contains a binary zero byte within the first 1024 bytes
is identified as a binary file, and is processed specially. (GNU grep
identifies binary files in this manner.) However, if the newline type is
-specified as "nul", that is, the line terminator is a binary zero, the test for
+specified as NUL, that is, the line terminator is a binary zero, the test for
a binary file is not applied. See the <b>--binary-files</b> option for a means
of changing the way binary files are handled.
</P>
@@ -601,25 +601,32 @@ does not work when input is read line by line (see \fP--line-buffered\fP.)
</P>
<P>
<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
-The PCRE2 library supports five different conventions for indicating
-the ends of lines. They are the single-character sequences CR (carriage return)
-and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
-which recognizes any of the preceding three types, and an "any" convention, in
-which any Unicode line ending sequence is assumed to end a line. The Unicode
-sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
-(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
-PS (paragraph separator, U+2029).
+Six different conventions for indicating the ends of lines in scanned files are
+supported. For example:
+<pre>
+ pcre2grep -N CRLF 'some pattern' &#60;file&#62;
+</pre>
+The newline type may be specified in upper, lower, or mixed case. If the
+newline type is NUL, lines are separated by binary zero characters. The other
+types are the single-character sequences CR (carriage return) and LF
+(linefeed), the two-character sequence CRLF, an "anycrlf" type, which
+recognizes any of the preceding three types, and an "any" type, for which any
+Unicode line ending sequence is assumed to end a line. The Unicode sequences
+are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029).
<br>
<br>
When the PCRE2 library is built, a default line-ending sequence is specified.
This is normally the standard sequence for the operating system. Unless
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
-The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
-makes it possible to use <b>pcre2grep</b> to scan files that have come from
-other environments without having to modify their line endings. If the data
-that is being scanned does not agree with the convention set by this option,
-<b>pcre2grep</b> may behave in strange ways. Note that this option does not
-apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
+<br>
+<br>
+This option makes it possible to use <b>pcre2grep</b> to scan files that have
+come from other environments without having to modify their line endings. If
+the data that is being scanned does not agree with the convention set by this
+option, <b>pcre2grep</b> may behave in strange ways. Note that this option does
+not apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
<b>--include-from</b> options, which are expected to use the operating system's
standard newline sequence.
</P>
@@ -640,12 +647,14 @@ use of JIT at run time. It is provided for testing and working round problems.
It should never be needed in normal use.
</P>
<P>
-<b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
+<b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
When there is a match, instead of outputting the whole line that matched,
-output just the given text. This option is mutually exclusive with
-<b>--only-matching</b>, <b>--file-offsets</b>, and <b>--line-offsets</b>. Escape
-sequences starting with a dollar character may be used to insert the contents
-of the matched part of the line and/or captured substrings into the text.
+output just the given text, followed by an operating-system standard newline.
+The <b>--newline</b> option has no effect on this option, which is mutually
+exclusive with <b>--only-matching</b>, <b>--file-offsets</b>, and
+<b>--line-offsets</b>. Escape sequences starting with a dollar character may be
+used to insert the contents of the matched part of the line and/or captured
+substrings into the text.
<br>
<br>
$&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the captured
@@ -807,16 +816,27 @@ by the <b>--locale</b> option. If no locale is set, the PCRE2 library's default
<br><a name="SEC8" href="#TOC1">NEWLINES</a><br>
<P>
The <b>-N</b> (<b>--newline</b>) option allows <b>pcre2grep</b> to scan files with
-different newline conventions from the default. Any parts of the input files
-that are written to the standard output are copied identically, with whatever
-newline sequences they have in the input. However, the setting of this option
-affects only the way scanned files are processed. It does not affect the
-interpretation of files specified by the <b>-f</b>, <b>--file-list</b>,
-<b>--exclude-from</b>, or <b>--include-from</b> options, nor does it affect the
-way in which <b>pcre2grep</b> writes informational messages to the standard
-error and output streams. For these it uses the string "\n" to indicate
-newlines, relying on the C I/O library to convert this to an appropriate
-sequence.
+newline conventions that differ from the default. This option affects only the
+way scanned files are processed. It does not affect the interpretation of files
+specified by the <b>-f</b>, <b>--file-list</b>, <b>--exclude-from</b>, or
+<b>--include-from</b> options.
+</P>
+<P>
+Any parts of the scanned input files that are written to the standard output
+are copied with whatever newline sequences they have in the input. However, if
+the final line of a file is output, and it does not end with a newline
+sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF
+or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a
+single NL is used.
+</P>
+<P>
+The newline setting does not affect the way in which <b>pcre2grep</b> writes
+newlines in informational messages to the standard output and error streams.
+Under Windows, the standard output is set to be binary, so that "\r\n" at the
+ends of output lines that are copied from the input is not converted to
+"\r\r\n" by the C I/O library. This means that any messages written to the
+standard output must end with "\r\n". For all other operating systems, and
+for all messages to the standard error stream, "\n" is used.
</P>
<br><a name="SEC9" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
<P>
@@ -992,9 +1012,9 @@ Cambridge, England.
</P>
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 15 June 2019
+Last updated: 25 January 2020
<br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/pcre2grep.1 b/doc/pcre2grep.1
index 22992b1..82f0435 100644
--- a/doc/pcre2grep.1
+++ b/doc/pcre2grep.1
@@ -1,4 +1,4 @@
-.TH PCRE2GREP 1 "15 June 2019" "PCRE2 10.34"
+.TH PCRE2GREP 1 "25 January 2020" "PCRE2 10.35"
.SH NAME
pcre2grep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -117,7 +117,7 @@ ignored.
By default, a file that contains a binary zero byte within the first 1024 bytes
is identified as a binary file, and is processed specially. (GNU grep
identifies binary files in this manner.) However, if the newline type is
-specified as "nul", that is, the line terminator is a binary zero, the test for
+specified as NUL, that is, the line terminator is a binary zero, the test for
a binary file is not applied. See the \fB--binary-files\fP option for a means
of changing the way binary files are handled.
.
@@ -523,24 +523,30 @@ large processing buffer, this should not be a problem, but the \fB-M\fP option
does not work when input is read line by line (see \fP--line-buffered\fP.)
.TP
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
-The PCRE2 library supports five different conventions for indicating
-the ends of lines. They are the single-character sequences CR (carriage return)
-and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
-which recognizes any of the preceding three types, and an "any" convention, in
-which any Unicode line ending sequence is assumed to end a line. The Unicode
-sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
-(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
-PS (paragraph separator, U+2029).
+Six different conventions for indicating the ends of lines in scanned files are
+supported. For example:
+.sp
+ pcre2grep -N CRLF 'some pattern' <file>
+.sp
+The newline type may be specified in upper, lower, or mixed case. If the
+newline type is NUL, lines are separated by binary zero characters. The other
+types are the single-character sequences CR (carriage return) and LF
+(linefeed), the two-character sequence CRLF, an "anycrlf" type, which
+recognizes any of the preceding three types, and an "any" type, for which any
+Unicode line ending sequence is assumed to end a line. The Unicode sequences
+are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029).
.sp
When the PCRE2 library is built, a default line-ending sequence is specified.
This is normally the standard sequence for the operating system. Unless
otherwise specified by this option, \fBpcre2grep\fP uses the library's default.
-The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
-makes it possible to use \fBpcre2grep\fP to scan files that have come from
-other environments without having to modify their line endings. If the data
-that is being scanned does not agree with the convention set by this option,
-\fBpcre2grep\fP may behave in strange ways. Note that this option does not
-apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
+.sp
+This option makes it possible to use \fBpcre2grep\fP to scan files that have
+come from other environments without having to modify their line endings. If
+the data that is being scanned does not agree with the convention set by this
+option, \fBpcre2grep\fP may behave in strange ways. Note that this option does
+not apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
\fB--include-from\fP options, which are expected to use the operating system's
standard newline sequence.
.TP
@@ -558,12 +564,14 @@ was explicitly disabled at build time. This option can be used to disable the
use of JIT at run time. It is provided for testing and working round problems.
It should never be needed in normal use.
.TP
-\fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP
+\fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP
When there is a match, instead of outputting the whole line that matched,
-output just the given text. This option is mutually exclusive with
-\fB--only-matching\fP, \fB--file-offsets\fP, and \fB--line-offsets\fP. Escape
-sequences starting with a dollar character may be used to insert the contents
-of the matched part of the line and/or captured substrings into the text.
+output just the given text, followed by an operating-system standard newline.
+The \fB--newline\fP option has no effect on this option, which is mutually
+exclusive with \fB--only-matching\fP, \fB--file-offsets\fP, and
+\fB--line-offsets\fP. Escape sequences starting with a dollar character may be
+used to insert the contents of the matched part of the line and/or captured
+substrings into the text.
.sp
$<digits> or ${<digits>} is replaced by the captured
substring of the given decimal number; zero substitutes the whole match. If
@@ -709,16 +717,25 @@ by the \fB--locale\fP option. If no locale is set, the PCRE2 library's default
.rs
.sp
The \fB-N\fP (\fB--newline\fP) option allows \fBpcre2grep\fP to scan files with
-different newline conventions from the default. Any parts of the input files
-that are written to the standard output are copied identically, with whatever
-newline sequences they have in the input. However, the setting of this option
-affects only the way scanned files are processed. It does not affect the
-interpretation of files specified by the \fB-f\fP, \fB--file-list\fP,
-\fB--exclude-from\fP, or \fB--include-from\fP options, nor does it affect the
-way in which \fBpcre2grep\fP writes informational messages to the standard
-error and output streams. For these it uses the string "\en" to indicate
-newlines, relying on the C I/O library to convert this to an appropriate
-sequence.
+newline conventions that differ from the default. This option affects only the
+way scanned files are processed. It does not affect the interpretation of files
+specified by the \fB-f\fP, \fB--file-list\fP, \fB--exclude-from\fP, or
+\fB--include-from\fP options.
+.P
+Any parts of the scanned input files that are written to the standard output
+are copied with whatever newline sequences they have in the input. However, if
+the final line of a file is output, and it does not end with a newline
+sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF
+or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a
+single NL is used.
+.P
+The newline setting does not affect the way in which \fBpcre2grep\fP writes
+newlines in informational messages to the standard output and error streams.
+Under Windows, the standard output is set to be binary, so that "\er\en" at the
+ends of output lines that are copied from the input is not converted to
+"\er\er\en" by the C I/O library. This means that any messages written to the
+standard output must end with "\er\en". For all other operating systems, and
+for all messages to the standard error stream, "\en" is used.
.
.
.SH "OPTIONS COMPATIBILITY"
@@ -904,6 +921,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 15 June 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 25 January 2020
+Copyright (c) 1997-2020 University of Cambridge.
.fi
diff --git a/doc/pcre2grep.txt b/doc/pcre2grep.txt
index b11092a..4d41f54 100644
--- a/doc/pcre2grep.txt
+++ b/doc/pcre2grep.txt
@@ -116,9 +116,9 @@ BINARY FILES
By default, a file that contains a binary zero byte within the first
1024 bytes is identified as a binary file, and is processed specially.
(GNU grep identifies binary files in this manner.) However, if the new-
- line type is specified as "nul", that is, the line terminator is a bi-
- nary zero, the test for a binary file is not applied. See the --binary-
- files option for a means of changing the way binary files are handled.
+ line type is specified as NUL, that is, the line terminator is a binary
+ zero, the test for a binary file is not applied. See the --binary-files
+ option for a means of changing the way binary files are handled.
BINARY ZEROS IN PATTERNS
@@ -578,30 +578,36 @@ OPTIONS
when input is read line by line (see --line-buffered.)
-N newline-type, --newline=newline-type
- The PCRE2 library supports five different conventions for in-
- dicating the ends of lines. They are the single-character se-
- quences CR (carriage return) and LF (linefeed), the two-char-
- acter sequence CRLF, an "anycrlf" convention, which recog-
- nizes any of the preceding three types, and an "any" conven-
- tion, in which any Unicode line ending sequence is assumed to
- end a line. The Unicode sequences are the three just men-
- tioned, plus VT (vertical tab, U+000B), FF (form feed,
- U+000C), NEL (next line, U+0085), LS (line separator,
- U+2028), and PS (paragraph separator, U+2029).
+ Six different conventions for indicating the ends of lines in
+ scanned files are supported. For example:
+
+ pcre2grep -N CRLF 'some pattern' <file>
+
+ The newline type may be specified in upper, lower, or mixed
+ case. If the newline type is NUL, lines are separated by bi-
+ nary zero characters. The other types are the single-charac-
+ ter sequences CR (carriage return) and LF (linefeed), the
+ two-character sequence CRLF, an "anycrlf" type, which recog-
+ nizes any of the preceding three types, and an "any" type,
+ for which any Unicode line ending sequence is assumed to end
+ a line. The Unicode sequences are the three just mentioned,
+ plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL
+ (next line, U+0085), LS (line separator, U+2028), and PS
+ (paragraph separator, U+2029).
When the PCRE2 library is built, a default line-ending se-
quence is specified. This is normally the standard sequence
for the operating system. Unless otherwise specified by this
- option, pcre2grep uses the library's default. The possible
- values for this option are CR, LF, CRLF, ANYCRLF, or ANY.
- This makes it possible to use pcre2grep to scan files that
- have come from other environments without having to modify
- their line endings. If the data that is being scanned does
- not agree with the convention set by this option, pcre2grep
- may behave in strange ways. Note that this option does not
- apply to files specified by the -f, --exclude-from, or --in-
- clude-from options, which are expected to use the operating
- system's standard newline sequence.
+ option, pcre2grep uses the library's default.
+
+ This option makes it possible to use pcre2grep to scan files
+ that have come from other environments without having to mod-
+ ify their line endings. If the data that is being scanned
+ does not agree with the convention set by this option,
+ pcre2grep may behave in strange ways. Note that this option
+ does not apply to files specified by the -f, --exclude-from,
+ or --include-from options, which are expected to use the op-
+ erating system's standard newline sequence.
-n, --line-number
Precede each output line by its line number in the file, fol-
@@ -620,11 +626,13 @@ OPTIONS
-O text, --output=text
When there is a match, instead of outputting the whole line
- that matched, output just the given text. This option is mu-
- tually exclusive with --only-matching, --file-offsets, and
- --line-offsets. Escape sequences starting with a dollar char-
- acter may be used to insert the contents of the matched part
- of the line and/or captured substrings into the text.
+ that matched, output just the given text, followed by an op-
+ erating-system standard newline. The --newline option has no
+ effect on this option, which is mutually exclusive with
+ --only-matching, --file-offsets, and --line-offsets. Escape
+ sequences starting with a dollar character may be used to in-
+ sert the contents of the matched part of the line and/or cap-
+ tured substrings into the text.
$<digits> or ${<digits>} is replaced by the captured sub-
string of the given decimal number; zero substitutes the
@@ -780,17 +788,27 @@ ENVIRONMENT VARIABLES
NEWLINES
- The -N (--newline) option allows pcre2grep to scan files with different
- newline conventions from the default. Any parts of the input files that
- are written to the standard output are copied identically, with what-
- ever newline sequences they have in the input. However, the setting of
- this option affects only the way scanned files are processed. It does
- not affect the interpretation of files specified by the -f, --file-
- list, --exclude-from, or --include-from options, nor does it affect the
- way in which pcre2grep writes informational messages to the standard
- error and output streams. For these it uses the string "\n" to indicate
- newlines, relying on the C I/O library to convert this to an appropri-
- ate sequence.
+ The -N (--newline) option allows pcre2grep to scan files with newline
+ conventions that differ from the default. This option affects only the
+ way scanned files are processed. It does not affect the interpretation
+ of files specified by the -f, --file-list, --exclude-from, or --in-
+ clude-from options.
+
+ Any parts of the scanned input files that are written to the standard
+ output are copied with whatever newline sequences they have in the in-
+ put. However, if the final line of a file is output, and it does not
+ end with a newline sequence, a newline sequence is added. If the new-
+ line setting is CR, LF, CRLF or NUL, that line ending is output; for
+ the other settings (ANYCRLF or ANY) a single NL is used.
+
+ The newline setting does not affect the way in which pcre2grep writes
+ newlines in informational messages to the standard output and error
+ streams. Under Windows, the standard output is set to be binary, so
+ that "\r\n" at the ends of output lines that are copied from the input
+ is not converted to "\r\r\n" by the C I/O library. This means that any
+ messages written to the standard output must end with "\r\n". For all
+ other operating systems, and for all messages to the standard error
+ stream, "\n" is used.
OPTIONS COMPATIBILITY
@@ -963,5 +981,5 @@ AUTHOR
REVISION
- Last updated: 15 June 2019
- Copyright (c) 1997-2019 University of Cambridge.
+ Last updated: 25 January 2020
+ Copyright (c) 1997-2020 University of Cambridge.
diff --git a/src/pcre2grep.c b/src/pcre2grep.c
index 12fe95e..10314a5 100644
--- a/src/pcre2grep.c
+++ b/src/pcre2grep.c
@@ -13,7 +13,7 @@ distribution because other apparatus is needed to compile pcre2grep for z/OS.
The header can be found in the special z/OS distribution, which is available
from www.zaconsultants.net or from www.cbttape.org.
- Copyright (c) 1997-2019 University of Cambridge
+ Copyright (c) 1997-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -1665,6 +1665,44 @@ switch(endlinetype)
+/*************************************************
+* Output newline at end *
+*************************************************/
+
+/* This function is called if the final line of a file has been written to
+stdout, but it does not have a terminating newline.
+
+Arguments: none
+Returns: nothing
+*/
+
+static void
+write_final_newline(void)
+{
+switch(endlinetype)
+ {
+ default: /* Just in case */
+ case PCRE2_NEWLINE_LF:
+ case PCRE2_NEWLINE_ANY:
+ case PCRE2_NEWLINE_ANYCRLF:
+ fprintf(stdout, "\n");
+ break;
+
+ case PCRE2_NEWLINE_CR:
+ fprintf(stdout, "\r");
+ break;
+
+ case PCRE2_NEWLINE_CRLF:
+ fprintf(stdout, "\r\n");
+ break;
+
+ case PCRE2_NEWLINE_NUL:
+ fprintf(stdout, "%c", 0);
+ break;
+ }
+}
+
+
/*************************************************
* Print the previous "after" lines *
*************************************************/
@@ -1689,9 +1727,9 @@ do_after_lines(unsigned long int lastmatchnumber, char *lastmatchrestart,
if (after_context > 0 && lastmatchnumber > 0)
{
int count = 0;
+ int ellength = 0;
while (lastmatchrestart < endptr && count < after_context)
{
- int ellength;
char *pp = end_of_line(lastmatchrestart, endptr, &ellength);
if (ellength == 0 && pp == main_buffer + bufsize) break;
if (printname != NULL) fprintf(stdout, "%s-", printname);
@@ -1700,7 +1738,17 @@ if (after_context > 0 && lastmatchnumber > 0)
lastmatchrestart = pp;
count++;
}
- if (count > 0) hyphenpending = TRUE;
+
+ /* If we have printed any lines, arrange for a hyphen separator if anything
+ else follows. Also, if the last line is the final line in the file and it had
+ no newline, add one. */
+
+ if (count > 0)
+ {
+ hyphenpending = TRUE;
+ if (ellength == 0 && lastmatchrestart >= endptr)
+ write_final_newline();
+ }
}
}
@@ -2437,6 +2485,7 @@ char *endptr;
PCRE2_SIZE bufflength;
BOOL binary = FALSE;
BOOL endhyphenpending = FALSE;
+BOOL lines_printed = FALSE;
BOOL input_line_buffered = line_buffered;
FILE *in = NULL; /* Ensure initialized */
@@ -2777,6 +2826,8 @@ while (ptr < endptr)
else
{
+ lines_printed = TRUE;
+
/* See if there is a requirement to print some "after" lines from a
previous match. We never print any overlaps. */
@@ -2825,7 +2876,8 @@ while (ptr < endptr)
int linecount = 0;
char *p = ptr;
- while (p > main_buffer && (lastmatchnumber == 0 || p > lastmatchrestart) &&
+ while (p > main_buffer &&
+ (lastmatchnumber == 0 || p > lastmatchrestart) &&
linecount < before_context)
{
linecount++;
@@ -2981,6 +3033,12 @@ while (ptr < endptr)
lastmatchrestart = ptr + linelength + endlinelength;
lastmatchnumber = linenumber + 1;
+
+ /* If a line was printed and we are now at the end of the file and the last
+ line had no newline, output one. */
+
+ if (lines_printed && lastmatchrestart >= endptr && endlinelength == 0)
+ write_final_newline();
}
/* For a match in multiline inverted mode (which of course did not cause
diff --git a/testdata/grepoutputN b/testdata/grepoutputN
index ba97e90..caaeb75 100644
--- a/testdata/grepoutputN
+++ b/testdata/grepoutputN
@@ -2,16 +2,20 @@
1:abc 2:def ---------------------------- Test N2 ------------------------------
1:abc def
2:ghi
-jkl---------------------------- Test N3 ------------------------------
+jkl
+---------------------------- Test N3 ------------------------------
2:def 3:
ghi
-jkl---------------------------- Test N4 ------------------------------
+jkl ---------------------------- Test N4 ------------------------------
2:ghi
-jkl---------------------------- Test N5 ------------------------------
+jkl
+---------------------------- Test N5 ------------------------------
1:abc 2:def
3:ghi
-4:jkl---------------------------- Test N6 ------------------------------
+4:jkl
+---------------------------- Test N6 ------------------------------
1:abc 2:def
3:ghi
-4:jkl---------------------------- Test N7 ------------------------------
-1:abcZERO2:def
+4:jkl
+---------------------------- Test N7 ------------------------------
+1:abcZERO2:defZERO
--
2.21.1

View file

@ -0,0 +1,43 @@
From 5e6a7641c60a1fcee8ae445be3511ce398c0baaa Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Sat, 11 Jan 2020 15:28:15 +0000
Subject: [PATCH] Fix *THEN verbs in lookahead assertions in JIT.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1204 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_compile.c | 3 ++-
src/pcre2_jit_test.c | 1 +
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index 78b94c1..00d13f1 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -9597,7 +9597,8 @@ if (opcode == OP_ASSERT || opcode == OP_ASSERTBACK)
}
else
{
- OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), 0);
+ SLJIT_ASSERT(extrasize == 3);
+ OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(-1));
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), bra == OP_BRAZERO ? STR_PTR : SLJIT_IMM, 0);
}
}
diff --git a/src/pcre2_jit_test.c b/src/pcre2_jit_test.c
index e0638ef..a188724 100644
--- a/src/pcre2_jit_test.c
+++ b/src/pcre2_jit_test.c
@@ -860,6 +860,7 @@ static struct regression_test_case regression_test_cases[] = {
{ MU, A, 0, 0, "(?(?!a(*THEN)b)ad|add)", "add" },
{ MU, A, 0, 0 | F_NOMATCH, "(?(?=a)a(*THEN)b|ad)", "ad" },
{ MU, A, 0, 0, "(?!(?(?=a)ab|b(*THEN)d))bn|bnn", "bnn" },
+ { MU, A, 0, 0, "(?=(*THEN: ))* ", " " },
/* Recurse and control verbs. */
{ MU, A, 0, 0, "(a(*ACCEPT)b){0}a(?1)b", "aacaabb" },
--
2.21.1

View file

@ -0,0 +1,49 @@
From 5446ab8fa22b7e685c01cbfc5a673d2c7f994c93 Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Thu, 20 Feb 2020 07:42:47 +0000
Subject: [PATCH] Fix a crash which occurs when the character type of an
invalid UTF character is decoded in JIT.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1221 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_compile.c | 6 ++++++
src/pcre2_jit_test.c | 2 ++
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index 10665a8..ef29a76 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -7224,7 +7224,13 @@ cc = ccbegin;
if ((cc[-1] & XCL_NOT) != 0)
read_char(common, min, max, backtracks, READ_CHAR_UPDATE_STR_PTR);
else
+ {
+#ifdef SUPPORT_UNICODE
+ read_char(common, min, max, (needstype || needsscript) ? backtracks : NULL, 0);
+#else /* !SUPPORT_UNICODE */
read_char(common, min, max, NULL, 0);
+#endif /* SUPPORT_UNICODE */
+ }
if ((cc[-1] & XCL_HASPROP) == 0)
{
diff --git a/src/pcre2_jit_test.c b/src/pcre2_jit_test.c
index 187e565..619e738 100644
--- a/src/pcre2_jit_test.c
+++ b/src/pcre2_jit_test.c
@@ -1965,6 +1965,8 @@ static struct invalid_utf8_regression_test_case invalid_utf8_regression_test_cas
{ PCRE2_UTF, CI, 0, 0, 0, 4, 8, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7#\xc7\x85#" },
{ PCRE2_UTF, CI, 0, 0, 0, 7, 11, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7\x80\x80\x80#\xc7\x85#" },
+ { PCRE2_UTF | PCRE2_UCP, CI, 0, 0, 0, -1, -1, { "[\\s]", NULL }, "\xed\xa0\x80" },
+
/* These two are not invalid UTF tests, but this infrastructure fits better for them. */
{ 0, PCRE2_JIT_COMPLETE, 0, 0, 1, -1, -1, { "\\X{2}", NULL }, "\r\n\n" },
{ 0, PCRE2_JIT_COMPLETE, 0, 0, 1, -1, -1, { "\\R{2}", NULL }, "\r\n\n" },
--
2.21.1

View file

@ -0,0 +1,144 @@
From 6f516ffef41280fbd9fd451fc7eab0c9ce98efad Mon Sep 17 00:00:00 2001
From: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Sun, 26 Jan 2020 15:31:27 +0000
Subject: [PATCH] Fix bug in processing (?(DEFINE)...) within lookbehind
assertions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1212 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
Signed-off-by: Petr Písař <ppisar@redhat.com>
---
src/pcre2_compile.c | 20 ++++++++++++++------
testdata/testinput1 | 13 +++++++++++++
testdata/testinput2 | 4 ++++
testdata/testoutput1 | 17 +++++++++++++++++
testdata/testoutput2 | 5 +++++
5 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/src/pcre2_compile.c b/src/pcre2_compile.c
index f2e6b6b..628503c 100644
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@@ -8836,9 +8836,10 @@ memset(slot + IMM2_SIZE + length, 0,
/* This function is called to skip parts of the parsed pattern when finding the
length of a lookbehind branch. It is called after (*ACCEPT) and (*FAIL) to find
-the end of the branch, it is called to skip over an internal lookaround, and it
-is also called to skip to the end of a class, during which it will never
-encounter nested groups (but there's no need to have special code for that).
+the end of the branch, it is called to skip over an internal lookaround or
+(DEFINE) group, and it is also called to skip to the end of a class, during
+which it will never encounter nested groups (but there's no need to have
+special code for that).
When called to find the end of a branch or group, pptr must point to the first
meta code inside the branch, not the branch-starting code. In other cases it
@@ -9316,14 +9317,21 @@ for (;; pptr++)
itemlength = grouplength;
break;
- /* Check nested groups - advance past the initial data for each type and
- then seek a fixed length with get_grouplength(). */
+ /* A (DEFINE) group is never obeyed inline and so it does not contribute to
+ the length of this branch. Skip from the following item to the next
+ unpaired ket. */
+
+ case META_COND_DEFINE:
+ pptr = parsed_skip(pptr + 1, PSKIP_KET);
+ break;
+
+ /* Check other nested groups - advance past the initial data for each type
+ and then seek a fixed length with get_grouplength(). */
case META_COND_NAME:
case META_COND_NUMBER:
case META_COND_RNAME:
case META_COND_RNUMBER:
- case META_COND_DEFINE:
pptr += 2 + SIZEOFFSET;
goto CHECK_GROUP;
diff --git a/testdata/testinput1 b/testdata/testinput1
index f5159d6..959d4b8 100644
--- a/testdata/testinput1
+++ b/testdata/testinput1
@@ -6386,4 +6386,17 @@ ef) x/x,mark
/^(?<A>a)(?(<A>)b)((?<=b).*)$/
abc
+"(?<=X(?(DEFINE)(A)))X(*F)"
+\= Expect no match
+ AXYZ
+
+"(?<=X(?(DEFINE)(A)))."
+ AXYZ
+
+"(?<=X(?(DEFINE)(.*))Y)."
+ AXYZ
+
+"(?<=X(?(DEFINE)(Y))(?1))."
+ AXYZ
+
# End of testinput1
diff --git a/testdata/testinput2 b/testdata/testinput2
index 655e519..7f70860 100644
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -5772,4 +5772,8 @@ a)"xI
/(a)?a/I
manm
+# Expect non-fixed-length error
+
+"(?<=X(?(DEFINE)(.*))(?1))."
+
# End of testinput2
diff --git a/testdata/testoutput1 b/testdata/testoutput1
index ad2175b..dfb6366 100644
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@@ -10112,4 +10112,21 @@ No match
1: a
2: c
+"(?<=X(?(DEFINE)(A)))X(*F)"
+\= Expect no match
+ AXYZ
+No match
+
+"(?<=X(?(DEFINE)(A)))."
+ AXYZ
+ 0: Y
+
+"(?<=X(?(DEFINE)(.*))Y)."
+ AXYZ
+ 0: Z
+
+"(?<=X(?(DEFINE)(Y))(?1))."
+ AXYZ
+ 0: Z
+
# End of testinput1
diff --git a/testdata/testoutput2 b/testdata/testoutput2
index c733c12..69d1a7b 100644
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -17435,6 +17435,11 @@ Subject length lower bound = 1
manm
0: a
+# Expect non-fixed-length error
+
+"(?<=X(?(DEFINE)(.*))(?1))."
+Failed: error 125 at offset 0: lookbehind assertion is not fixed length
+
# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data
--
2.21.1

View file

@ -0,0 +1,55 @@
From a6749bb6c7c6fbfe849fb7e4e8dcf9d0e767d3e4 Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Mon, 10 Feb 2020 10:18:01 +0000
Subject: [PATCH] Fix control verb chain restoration issue in JIT.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1217 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_compile.c | 8 ++++----
src/pcre2_jit_test.c | 1 +
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index 7874fac..10665a8 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -2693,8 +2693,8 @@ while (cc < ccend)
}
if (common->control_head_ptr != 0 && !control_head_found)
{
- shared_srcw[0] = common->control_head_ptr;
- shared_count = 1;
+ private_srcw[0] = common->control_head_ptr;
+ private_count = 1;
control_head_found = TRUE;
}
cc += 1 + 2 + cc[1];
@@ -2704,8 +2704,8 @@ while (cc < ccend)
SLJIT_ASSERT(common->control_head_ptr != 0);
if (!control_head_found)
{
- shared_srcw[0] = common->control_head_ptr;
- shared_count = 1;
+ private_srcw[0] = common->control_head_ptr;
+ private_count = 1;
control_head_found = TRUE;
}
cc++;
diff --git a/src/pcre2_jit_test.c b/src/pcre2_jit_test.c
index a188724..187e565 100644
--- a/src/pcre2_jit_test.c
+++ b/src/pcre2_jit_test.c
@@ -861,6 +861,7 @@ static struct regression_test_case regression_test_cases[] = {
{ MU, A, 0, 0 | F_NOMATCH, "(?(?=a)a(*THEN)b|ad)", "ad" },
{ MU, A, 0, 0, "(?!(?(?=a)ab|b(*THEN)d))bn|bnn", "bnn" },
{ MU, A, 0, 0, "(?=(*THEN: ))* ", " " },
+ { MU, A, 0, 0, "a(*THEN)(?R) |", "a" },
/* Recurse and control verbs. */
{ MU, A, 0, 0, "(a(*ACCEPT)b){0}a(?1)b", "aacaabb" },
--
2.21.1

View file

@ -0,0 +1,45 @@
From 75e399f77b5ffd82194b461e837a32cf48a5d970 Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Sat, 7 Dec 2019 16:00:53 +0000
Subject: [PATCH] Fix the too early access of the fields of a compiled pattern
in JIT.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1192 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_compile.c | 10 +++++-----
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index 1d64536..303c68f 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -13742,11 +13742,6 @@ pcre2_jit_compile(pcre2_code *code, uint32_t options)
{
pcre2_real_code *re = (pcre2_real_code *)code;
-#ifdef SUPPORT_JIT
-executable_functions *functions = (executable_functions *)re->executable_jit;
-static int executable_allocator_is_working = 0;
-#endif
-
if (code == NULL)
return PCRE2_ERROR_NULL;
@@ -13779,6 +13774,11 @@ actions are needed:
avoid compiler warnings.
*/
+#ifdef SUPPORT_JIT
+executable_functions *functions = (executable_functions *)re->executable_jit;
+static int executable_allocator_is_working = 0;
+#endif
+
if ((options & PCRE2_JIT_INVALID_UTF) != 0)
{
if ((re->overall_options & PCRE2_MATCH_INVALID_UTF) == 0)
--
2.21.0

View file

@ -0,0 +1,117 @@
From b251f0bc17a4d5a3b3f7690432113c773bcbe13f Mon Sep 17 00:00:00 2001
From: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Mon, 27 Jan 2020 10:28:19 +0000
Subject: [PATCH] Limit function recursion in pcre2_study to avoid stack
overflow issues.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1213 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Port to 10.34.
---
src/pcre2_study.c | 31 ++++++++++++++++++++++---------
diff --git a/src/pcre2_study.c b/src/pcre2_study.c
index 2883868..5af01b5 100644
--- a/src/pcre2_study.c
+++ b/src/pcre2_study.c
@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
- New API code Copyright (c) 2016-2019 University of Cambridge
+ New API code Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -58,7 +58,7 @@ collecting data (e.g. minimum matching length). */
/* Returns from set_start_bits() */
-enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE, SSB_UNKNOWN };
+enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE, SSB_UNKNOWN, SSB_TOODEEP };
/*************************************************
@@ -924,19 +924,24 @@ The SSB_CONTINUE return is useful for parenthesized groups in patterns such as
must continue at the outer level to find at least one mandatory code unit. At
the outermost level, this function fails unless the result is SSB_DONE.
+We restrict recursion (for nested groups) to 1000 to avoid stack overflow
+issues.
+
Arguments:
re points to the compiled regex block
code points to an expression
utf TRUE if in UTF mode
+ depthptr pointer to recurse depth
Returns: SSB_FAIL => Failed to find any starting code units
SSB_DONE => Found mandatory starting code units
SSB_CONTINUE => Found optional starting code units
SSB_UNKNOWN => Hit an unrecognized opcode
+ SSB_TOODEEP => Recursion is too deep
*/
static int
-set_start_bits(pcre2_real_code *re, PCRE2_SPTR code, BOOL utf)
+set_start_bits(pcre2_real_code *re, PCRE2_SPTR code, BOOL utf, int *depthptr)
{
uint32_t c;
int yield = SSB_DONE;
@@ -947,6 +952,9 @@ int table_limit = utf? 16:32;
int table_limit = 32;
#endif
+*depthptr += 1;
+if (*depthptr > 1000) return SSB_TOODEEP;
+
do
{
BOOL try_next = TRUE;
@@ -1103,13 +1111,17 @@ do
case OP_SCRIPT_RUN:
case OP_ASSERT:
case OP_ASSERT_NA:
- rc = set_start_bits(re, tcode, utf);
- if (rc == SSB_FAIL || rc == SSB_UNKNOWN) return rc;
- if (rc == SSB_DONE) try_next = FALSE; else
+ rc = set_start_bits(re, tcode, utf, depthptr);
+ if (rc == SSB_DONE)
+ {
+ try_next = FALSE;
+ }
+ else if (rc == SSB_CONTINUE)
{
do tcode += GET(tcode, 1); while (*tcode == OP_ALT);
tcode += 1 + LINK_SIZE;
}
+ else return rc; /* FAIL, UNKNOWN, or TOODEEP */
break;
/* If we hit ALT or KET, it means we haven't found anything mandatory in
@@ -1155,8 +1167,8 @@ do
case OP_BRAZERO:
case OP_BRAMINZERO:
case OP_BRAPOSZERO:
- rc = set_start_bits(re, ++tcode, utf);
- if (rc == SSB_FAIL || rc == SSB_UNKNOWN) return rc;
+ rc = set_start_bits(re, ++tcode, utf, depthptr);
+ if (rc == SSB_FAIL || rc == SSB_UNKNOWN || rc == SSB_TOODEEP) return rc;
do tcode += GET(tcode,1); while (*tcode == OP_ALT);
tcode += 1 + LINK_SIZE;
break;
@@ -1664,7 +1676,8 @@ code units. */
if ((re->flags & (PCRE2_FIRSTSET|PCRE2_STARTLINE)) == 0)
{
- int rc = set_start_bits(re, code, utf);
+ int depth = 0;
+ int rc = set_start_bits(re, code, utf, &depth);
if (rc == SSB_UNKNOWN) return 1;
/* If a list of starting code units was set up, scan the list to see if only
--
2.21.1

View file

@ -0,0 +1,33 @@
From 73417f882ac907a182e1491ead2eecb7c5e559cc Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Fri, 24 Jan 2020 08:28:23 +0000
Subject: [PATCH] The JIT stack should be freed when the low-level stack
allocation fails.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1207 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_misc.c | 5 +++++
diff --git a/src/pcre2_jit_misc.c b/src/pcre2_jit_misc.c
index efdb055..36abdba 100644
--- a/src/pcre2_jit_misc.c
+++ b/src/pcre2_jit_misc.c
@@ -145,6 +145,11 @@ maxsize = (maxsize + STACK_GROWTH_RATE - 1) & ~(STACK_GROWTH_RATE - 1);
jit_stack = PRIV(memctl_malloc)(sizeof(pcre2_real_jit_stack), (pcre2_memctl *)gcontext);
if (jit_stack == NULL) return NULL;
jit_stack->stack = sljit_allocate_stack(startsize, maxsize, &jit_stack->memctl);
+if (jit_stack->stack == NULL)
+ {
+ jit_stack->memctl.free(jit_stack, jit_stack->memctl.memory_data);
+ return NULL;
+ }
return jit_stack;
#endif
--
2.21.1

View file

@ -0,0 +1,44 @@
From 037a7a81a46898c61e780cd23feddbae73b87839 Mon Sep 17 00:00:00 2001
From: zherczeg <zherczeg@6239d852-aaf2-0410-a92c-79f79f948069>
Date: Thu, 28 Nov 2019 11:35:08 +0000
Subject: [PATCH] Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1190 6239d852-aaf2-0410-a92c-79f79f948069
Petr Písař: Ported to 10.34.
---
src/pcre2_jit_compile.c | 4 ++--
src/pcre2_jit_test.c | 1 +
diff --git a/src/pcre2_jit_compile.c b/src/pcre2_jit_compile.c
index f564127..1d64536 100644
--- a/src/pcre2_jit_compile.c
+++ b/src/pcre2_jit_compile.c
@@ -13122,8 +13122,8 @@ common->read_only_data_head = NULL;
common->fcc = tables + fcc_offset;
common->lcc = (sljit_sw)(tables + lcc_offset);
common->mode = mode;
-common->might_be_empty = re->minlength == 0;
-common->allow_empty_partial = (re->max_lookbehind > 0) || (re->flags & PCRE2_MATCH_EMPTY) != 0;
+common->might_be_empty = (re->minlength == 0) || (re->flags & PCRE2_MATCH_EMPTY);
+common->allow_empty_partial = (re->max_lookbehind > 0) || (re->flags & PCRE2_MATCH_EMPTY);
common->nltype = NLTYPE_FIXED;
switch(re->newline_convention)
{
diff --git a/src/pcre2_jit_test.c b/src/pcre2_jit_test.c
index a9b3880..e0638ef 100644
--- a/src/pcre2_jit_test.c
+++ b/src/pcre2_jit_test.c
@@ -638,6 +638,7 @@ static struct regression_test_case regression_test_cases[] = {
{ MU, A, 0, 0, "(?=(?:x|ab(*ACCEPT)b))", "ab" },
{ MU, A, 0, 0, "(?=(a(b(*ACCEPT)b)))a", "ab" },
{ MU, A, PCRE2_NOTEMPTY, 0, "(?=a*(*ACCEPT))c", "c" },
+ { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "(?=A)", "AB" },
/* Conditional blocks. */
{ MU, A, 0, 0, "(?(?=(a))a|b)+k", "ababbalbbadabak" },
--
2.21.0

View file

@ -8,14 +8,41 @@
Summary: Perl-compatible regular expression library
Name: pcre2
Version: 10.32
Release: 3
Version: 10.34
Release: 1
License: BSD
Group: System/Libraries
Url: http://www.pcre.org/
Source0: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/%{name}-%{version}.tar.bz2
Source0: https://ftp.pcre.org/pub/pcre/%{name}-%{version}.zip
# Do no set RPATH if libdir is not /usr/lib
Patch0: pcre2-10.10-Fix-multilib.patch
Patch0: pcre2-10.10-Fix-multilib.patch
# Fix JIT to respect NOTEMPTY options, upstream bug #2473,
# in upstream after 10.34
Patch1: pcre2-10.34-Use-PCRE2_MATCH_EMPTY-flag-to-detect-empty-matches-i.patch
# Fix a crash in pcre2_jit_compile when passing a NULL code argument,
# upstream bug #2487, in upstream after 10.34
Patch2: pcre2-10.34-Fix-the-too-early-access-of-the-fields-of-a-compiled.patch
# Fix a crash in JITted code when a *THEN verb is used in a lookahead assertion,
# upstream bug #2510, in upstream after 10.34
Patch3: pcre2-10.34-Fix-THEN-verbs-in-lookahead-assertions-in-JIT.patch
# Fix a memory leak when allocating a JIT stack fails, in upstream after 10.34
Patch4: pcre2-10.34-The-JIT-stack-should-be-freed-when-the-low-level-sta.patch
# Ensure a newline after the final line in a file is output by pcre2grep,
# upstream bug #2513, in upstream after 10.34
Patch5: pcre2-10.34-Ensure-a-newline-after-the-final-line-in-a-file-is-o.patch
# Fix processing (?(DEFINE)...) within look-behind assertions,
# in upstream after 10.34
Patch6: pcre2-10.34-Fix-bug-in-processing-DEFINE-.-within-lookbehind-ass.patch
# Prevent from a stack exhaustion when studying a pattern for nested groups by
# putting a limit of 1000 recursive calls, in upstream after 10.34
Patch7: pcre2-10.34-Limit-function-recursion-in-pcre2_study-to-avoid-sta.patch
# Fix restoring a verb chain list when exiting a JIT-compiled recursive
# function, in upstream after 10.34
Patch8: pcre2-10.34-Fix-control-verb-chain-restoration-issue-in-JIT.patch
# Fix a crash in JIT when an invalid UTF-8 character is encountered in
# match_invalid_utf mode, upstream bug #2529, in upstream after 10.34
Patch9: pcre2-10.34-Fix-a-crash-which-occurs-when-the-character-type-of-.patch
BuildRequires: readline-devel
%description
@ -128,8 +155,7 @@ Utilities demonstrating PCRE2 capabilities like pcre2grep or pcre2test.
#----------------------------------------------------------------------------
%prep
%setup -q
%patch0 -p1 -b .multilib
%autosetup -p1
%build
# Because of multilib patch